PACS 89.20.-a - Interdisciplinary applications of physics

PACS 89.75.Hc - Networks and genealogical trees

PACS 89.75.Da - Systems obeying scaling laws

Abstract - We study the game of go from a complex network perspective. We construct a directed network using a suitable definition of tactical moves including local patterns, and study this network for different datasets of professional and amateur games. The move distribution follows Zipf's law and the network is scale free, with statistical peculiarities different from other real directed networks, such as the World Wide Web. These specificities are reflected in the outcome of ranking algorithms applied to it. A detailed study of the eigenvalues and eigenvectors of the matrices used by the ranking algorithms singles out certain strategic situations. Our results should pave the way to better modeling of board games and other types of human strategic scheming.



Introduction. The study of complex networks has attracted increasing interest in the past decade, fueled in particular by the rapid development of communication and information networks. Tools and models have been created to describe the growth mechanisms and properties of such systems. In parallel, it has been realized that many important aspects of the physical world or of social interactions can be modeled by such networks. These tools have been applied to many fields of human activity, such as languages or friendships [1].

Board games, one of the oldest activities of human beings, have been played for millennia. Besides their intrinsic interest, they offer a privileged approach to the workings of decision-making in the human brain. Some of them are very difficult to model or simulate: only recently have computer programs been able to beat world chess champions. The old Asian game of go is even less tractable. Its state-space complexity, that is, the total number of legal positions, is about $10^{171}$, compared to a mere $10^{50}$ for chess [2]. It remains an open challenge for computer scientists: while Deep Blue famously beat the world chess champion Kasparov in 1997, no computer program has yet beaten a very good player in an even game.

Many studies have been devoted to "computer go", the simulation of the game of go on a computer. They were historically based on deterministic tree search algorithms such as minimax or alpha-beta, which propagate the values of an evaluation function (giving the game-theoretic value of a move) through the tree of allowed moves (see e.g. [Sill]). Recently, much progress has been made by introducing Monte-Carlo techniques [SJin], which basically estimate the value of a move by playing subsequent moves at random according to the rules of go. Monte-Carlo tree search algorithms are based on a non-uniform probability distribution over legal moves, and explore only the most promising ones. Variations on these techniques have allowed computer programs to make significant progress in the last few years, so that professional human players given a large enough handicap could be beaten by a computer [7]. The choice of the evaluation function and the way in which the tree is explored are crucial ingredients for any further progress.

Since go is a popular game with millions of players in the world, many games have been recorded, which allows statistical data to be extracted reliably. A few works have used statistical properties of recorded professional games to optimize the performance of Monte-Carlo algorithms. Usually only the simplest features of real games are retained, such as local patterns or contiguity to the previous move [5]; including more real-game features noticeably improves the winning rate of computer programs [5].

In this paper, we thus study the game of go from a complex network perspective. We use databases of expert games in order to construct networks from the different sequences of moves, and study the properties of these networks. We base our numerical results on the whole available record, from 1941 onwards, of the most important historical professional Japanese go tournaments: Kisei (143 games), Meijin (259 games), Honinbo (305 games) and Judan (158 games) [TO]. To improve statistics and to compare with professional tournaments, 4000 amateur games, also available from |T^, were used.

Definition of inequivalent moves. The game of go is played by two players (Black and White) on a board (goban) consisting of 19 horizontal and 19 vertical lines. The players alternately place a stone of their own color on an empty intersection of the board. Stones entirely surrounded by the opponent must be removed, and the aim of the game is to delimit large territories. As the game unfolds, both local and global properties of the stones come into play. A network approach will obviously not be able to capture all features of the game, as the number of possible moves is far too large. Here we follow an approach where only local features are retained. This approach is reminiscent of the one used in the context of language networks [11].

A move consists in placing a stone on an empty intersection $(h,v)$ with $1 \le h,v \le 19$. We call "plaquette" a square of $3\times 3$ intersections, that is, a subset of the board of the form $\{(h+r, v+s),\ -1 \le r,s \le 1\}$ (to account for edges and corners of the board one can imagine that there are two additional dummy lines at each side of the board). To define our network we only take into account the intersections closest to $(h,v)$. Vertices correspond to the different kinds of plaquettes in which a player can put a stone, irrespective of where it has been played on the board. Since each of the 8 neighboring intersections can be either empty, black or white, there are about $3^8 = 6561$ different plaquettes. We choose to identify plaquettes that transform into each other under any symmetry of the square (rotation or reflection). We also identify patterns with colors swapped. That is, a move where Black plays in a given plaquette is considered the same as a move where White plays in the same plaquette with colors swapped. An exact computation taking into account borders and symmetries leaves us with 1107 nonequivalent plaquettes with empty centers, which are the vertices of our network. We note that certain computer programs based on knowledge from real professional games also consider similar $3\times 3$ stone patterns [HUH]. Considering larger plaquettes is possible and would convey more relevant information; however, the number of vertices then becomes enormously large: for $5\times 5$ plaquettes, the 24 intersections surrounding the center can take $3^{24} \approx 2.8\cdot 10^{11}$ configurations before symmetry reduction.
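The count of 1107 vertices can be checked by brute-force enumeration. The following Python sketch is our own reconstruction, not the authors' code. It assumes that only the 8 neighbors of the empty center are recorded, that off-board neighbors carry a dummy value, and that identifying color-swapped patterns amounts to recording each stone as belonging to the player about to move ("own") or to the opponent. Keeping one representative per symmetry class of the square then yields 954 interior, 135 edge and 18 corner classes, i.e. 1107 in total, which supports this reading. The names `symmetries`, `canonical` and `count_plaquettes` are illustrative.

```python
from itertools import product

# Cell contents relative to the player about to move (our assumed encoding
# of the color-swap rule); OFF marks a dummy line beyond the board edge.
EMPTY, OWN, OPP, OFF = 0, 1, 2, 3

def symmetries(grid):
    """Yield the 8 images of a 3x3 grid (row-major tuple of 9 cells)
    under the rotations and reflections of the square."""
    g = [list(grid[0:3]), list(grid[3:6]), list(grid[6:9])]
    for _ in range(4):
        g = [list(row) for row in zip(*g[::-1])]             # rotate by 90 degrees
        yield tuple(c for row in g for c in row)
        yield tuple(c for row in g for c in reversed(row))   # mirror image

def canonical(grid):
    """Representative of a symmetry class: its lexicographically smallest image."""
    return min(symmetries(grid))

# Off-board cells for interior, edge and corner intersections (index 4 = center).
TOP, BOTTOM, LEFT, RIGHT = {0, 1, 2}, {6, 7, 8}, {0, 3, 6}, {2, 5, 8}
OFF_MASKS = [set(), TOP, BOTTOM, LEFT, RIGHT,
             TOP | LEFT, TOP | RIGHT, BOTTOM | LEFT, BOTTOM | RIGHT]

def count_plaquettes():
    """Number of inequivalent plaquettes with an empty center."""
    classes = set()
    for off in OFF_MASKS:
        free = [i for i in range(9) if i != 4 and i not in off]
        for colors in product((EMPTY, OWN, OPP), repeat=len(free)):
            grid = [OFF if i in off else EMPTY for i in range(9)]
            for i, c in zip(free, colors):
                grid[i] = c
            classes.add(canonical(tuple(grid)))
    return len(classes)

print(count_plaquettes())  # 1107
```

If stones were additionally identified regardless of which player moves, the count would drop well below 1107, so the "own/opponent" encoding appears to be the one consistent with the text.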

This definition of inequivalent moves enables us to investigate the first properties of the databases in terms of move frequencies. Zipf's law is an empirical regularity which has been observed in many natural distributions, such as word frequencies in the English language [12], city sizes [13], income distribution of companies [14], and chess openings [15]. If items are ranked according to their frequency, it predicts a power-law decay of the frequency as a function of the rank. Zipf's law was observed in the frequency distribution of $5\times 5$ go patterns [TO]. In Fig. 1 we display the integrated frequency
Fig. 1: (Color online) Normalized integrated frequency distribution of moves F(n) for Honinbo (black), Meijin (red), Judan (green), Kisei (blue) and amateur (violet) tournaments. The normalized number of occurrences of the 500 most frequent moves (among the 1107 moves described in the text) is shown vs the ranks of the moves (rankings depend slightly on the database). Slopes are resp. -1.058, -1.056, -1.065, -1.067, -1.081. Thick dashed line is y = -x. Inset: same with moves defined as the position of the stone on the board. Logarithms are decimal.



distribution for our 1107 moves labeled from the most to the least frequent. The integrated distribution of moves is very similar for all databases and clearly follows Zipf's law, with an exponent $\approx 1.06$. In contrast, such a law cannot be seen if one simply takes the 361 possible positions $(h,v)$ as vertices, disregarding local features (inset of Fig. 1). Thus Zipf's law appears when tactical information is taken into account. For all databases the 10 most frequent moves are the same (see Fig. 9, top line), albeit sometimes in a slightly different order.
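Exponents such as those quoted in Fig. 1 can be estimated by a least-squares fit in log-log coordinates. A minimal Python sketch, assuming the moves have already been mapped to their plaquette labels and that the fit is restricted to the 500 most frequent moves, as in Fig. 1. The text does not state which fitting procedure was used, so ordinary least squares on $\log_{10}$ frequency versus $\log_{10}$ rank is our choice, and the function names are hypothetical.

```python
import math
from collections import Counter

def rank_frequency(moves):
    """Normalized frequencies of distinct moves, from most to least frequent."""
    counts = sorted(Counter(moves).values(), reverse=True)
    total = sum(counts)
    return [c / total for c in counts]

def loglog_slope(freqs, n_max=500):
    """Least-squares slope of log10(frequency) against log10(rank)."""
    ys = [math.log10(f) for f in freqs[:n_max]]
    xs = [math.log10(rank) for rank in range(1, len(ys) + 1)]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Synthetic check: an exact power law with exponent 1.06 gives slope -1.06.
ideal = [rank ** -1.06 for rank in range(1, 1108)]
print(round(loglog_slope(ideal), 3))  # -1.06
```

Applied to real data, `rank_frequency` would receive the plaquette labels of all moves in a database; the synthetic power law only checks that an exact Zipf law is recovered.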





Fig. 2: (Color online) Integrated frequency distribution of sequences of moves f(n) for (from top to bottom) two to seven successive moves (all databases together), plotted against the ranks of the moves. Moves are the 1107 moves described in the text. Slopes are resp. -1.01, -0.91, -0.86, -0.83, -0.81, -0.77.

The go network. The dynamics of the game is built from successive moves. A game of go often consists of a series of small fights played at different places on the board. A player might put a stone in
Fig. 3: (Color online) Integrated frequency distribution of sequences of moves f(n) for sequences of two (continuous lines) and three (dashed lines) successive moves for (from bottom to top) case C1 (black, slopes -0.23, -0.4), C2 (red, slopes -0.25, -0.39), same curves as in the main panel (blue), C3 (green, -0.91, -0.70). Inset: distribution of distances between moves P(d); same color code as in Fig. 1. The four professional tournaments are almost indistinguishable.



the vicinity of their opponent's stones to engage the battle, but the opponent might prefer to first continue a fight occurring somewhere else, in which case two consecutive moves would not be directly related. In order to construct our network, it is thus natural to connect two moves by a directed link only if these moves follow each other in the same region of the board. To be more specific, we connect the vertices corresponding to moves a and b, played at $(h_a, v_a)$ and $(h_b, v_b)$ respectively, if b follows a in a game and $\max\{|h_b - h_a|, |v_b - v_a|\} \le d$. Each choice of the integer d defines a different network: d sets the distance beyond which two moves are considered unrelated. We present in Fig. 2 the frequency distribution for sequences of moves defined in this way with d = 4. We observe an algebraic decrease with exponent ranging from $\approx 1$ for short sequences to $\approx 0.8$ for longer sequences. Thus sequences of moves follow Zipf's law, as was observed for word sequences in languages [11]. We attribute the decrease of the exponent to the fact that longer sequences increasingly reflect individual strategies. For comparison, Fig. 3 displays the frequency distribution of successive moves for three other definitions of moves and sequences of moves. In cases C1 and C2, moves are defined as positions $(h,v)$ on the board, disregarding local features, and b is considered to follow a if b is played immediately after a (case C1) or if b is the first move played after a in the vicinity of a, with d = 4 (case C2); in case C3, sequences of vectors between two moves played in the same region (with d = 4) are considered. These results indicate that move sequences, even long ones, are best ordered hierarchically by our initial definition. In what follows we will thus disregard the other choices C1-C3. In the inset of Fig. 3 we also show the distribution P(d) of distances between consecutive moves. Interestingly, the amateur database departs significantly from all the professional ones, with a tendency to play more often at shorter distances. This may reflect the fact that professionals are more prone to play several tactical fights in parallel, or play on average shorter local tactical fights.
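The link rule can be sketched as follows. This is a hypothetical implementation under two assumptions that the text leaves open: each game is given as a list of `(h, v, label)` tuples whose plaquette labels were already computed while replaying the game (captures included), and "b follows a" means that b is the first later move lying within distance d of a, the rule stated explicitly for case C2. `distance_histogram` gives unnormalized counts behind P(d) in the inset of Fig. 3, assuming the same distance is used there.

```python
from collections import Counter

def chebyshev(p, q):
    """Board distance max(|h_b - h_a|, |v_b - v_a|) between two intersections."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def game_links(game, d=4):
    """Directed links a -> b for one game, given as (h, v, label) tuples.

    Assumption: b is the first later move within distance d of a.
    """
    links = []
    for i, (h, v, a) in enumerate(game):
        for h2, v2, b in game[i + 1:]:
            if chebyshev((h, v), (h2, v2)) <= d:
                links.append((a, b))
                break          # only the first nearby follow-up counts
    return links

def weighted_network(games, d=4):
    """Weight of each link = number of occurrences over all games."""
    weights = Counter()
    for game in games:
        weights.update(game_links(game, d))
    return weights

def distance_histogram(games):
    """Counts of distances between consecutive moves (cf. P(d))."""
    return Counter(chebyshev(g[i][:2], g[i + 1][:2])
                   for g in games for i in range(len(g) - 1))

game = [(4, 4, "A"), (16, 16, "B"), (5, 3, "C"), (16, 14, "A")]
print(game_links(game))  # [('A', 'C'), ('B', 'A')]
```

In this toy game, the second move is played far away, so the first move is linked to the third one rather than to its immediate successor.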

We now investigate the properties of our networks. We construct a network for each database by replaying the games according to the rules of go and adding directed links between the 1107 vertices as indicated above. Each link is assigned a weight given by its number of occurrences in the database.
Fig. 4: (Color online) Normalized integrated distribution of ingoing links $P_{\rm in}$ (lower curves, solid) and outgoing links $P_{\rm out}$ (upper curves, dashed), for networks built with d = 4. The number of vertices with more than k ingoing (outgoing) links is shown vs the normalized number of ingoing (outgoing) links $k/k_{\max}$. Same databases and color code as in Fig. 1. Thick solid line is y = -x. Inset: $P_{\rm in}$ (solid curves) and $P_{\rm out}$ (dashed curves) for the Honinbo database, d = 2 (black), 3 (red), 4 (green), 5 (blue) and 6 (violet).



The distributions of ingoing and outgoing links $P_{\rm in}$ and $P_{\rm out}$ are displayed in Fig. 4. The tails of both distributions are very close to a power law $1/k^{\gamma}$ with exponent $\gamma = 1.0$ for the integrated distribution. The results are stable in the sense that the exponent does not depend on the database considered. The presence of such power laws indicates that the network displays the scale-free property: the distribution has no characteristic scale, since multiplying k by a constant factor changes $P(k)$ by a factor independent of k. Such a property has been seen in many social or biological networks, but is absent in e.g. the famous Erdős-Rényi model of random networks. The symmetry between ingoing and outgoing links is a peculiarity of this network; it is well known for the World Wide Web (WWW), for example, that the exponent for $P_{\rm out}$ is much larger than for $P_{\rm in}$ ($\gamma \approx 1.1$) [17]. In the case of the WWW, the number of outgoing links is limited by the behavior of each independent webmaster. In our case, the results indicate a symmetry, at least at a statistical level, between moves that often follow others and moves which have many possible following moves. This may correspond to the fact that many short tactical sequences can be played in a different order within several different contexts. In order to analyze the dependence of $P_{\rm in}$ and $P_{\rm out}$ on the choice of d, we plot these quantities for networks constructed with various values of d in the inset of Fig. 4. The distributions of ingoing and outgoing links stabilize at d = 4. Other databases give similar results (data not shown). We now focus on d = 4.
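The integrated distributions of Fig. 4 can be computed from a dictionary mapping each directed link `(a, b)` to its weight. A minimal sketch; whether a vertex's number of links counts distinct neighbors or weighted occurrences is not stated in the text, so both are exposed through a hypothetical `weighted` flag. The degree list passed to `integrated_distribution` should contain all 1107 vertices, including those with no links.

```python
from collections import Counter

def degrees(weights, weighted=False):
    """In- and out-degrees from a {(a, b): weight} dict of links a -> b.

    weighted=False counts distinct links; weighted=True sums occurrences.
    """
    k_in, k_out = Counter(), Counter()
    for (a, b), w in weights.items():
        step = w if weighted else 1
        k_out[a] += step
        k_in[b] += step
    return k_in, k_out

def integrated_distribution(degree_list):
    """Pairs (k / k_max, fraction of vertices with more than k links)."""
    n, k_max = len(degree_list), max(degree_list)
    return [(k / k_max, sum(x > k for x in degree_list) / n)
            for k in sorted(set(degree_list))]

weights = {("A", "B"): 3, ("A", "C"): 1, ("B", "C"): 2}
k_in, k_out = degrees(weights)
print(integrated_distribution([k_out[v] for v in "ABC"]))
# [(0.0, 0.6666666666666666), (0.5, 0.3333333333333333), (1.0, 0.0)]
```

The exponent $\gamma$ would then be read off the tail of these pairs, for instance with the same log-log least-squares fit as for the rank-frequency plot.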

Some properties can be extracted from the unweighted adjacency matrix (i.e. without weighting the links according to their frequency). The clustering coefficient (CC) describes the tendency of many real-world networks to form local clusters of highly connected vertices. The CC of a given vertex i is defined as the probability that two neighbors of i are connected to each other, irrespective of the direction of the link. The average CC for our networks is displayed in the inset of Fig. 5. The CC depends on the number of games $n_g$ included to construct the network, but hardly on the database. For large $n_g$, the CC tends to an asymptotic value which appears to be larger than 0.7, indicating high clustering, larger than for the WWW (where the CC is $\approx 0.11$ [1]) and comparable to social [1] or language networks [11].
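The clustering coefficient described above can be sketched directly from the list of directed links. This is our own minimal version: direction and weights are ignored as in the text, while excluding self-loops (a plaquette type followed by the same type) and assigning CC = 0 to vertices with fewer than two neighbors are conventions we assume, not ones stated in the paper.

```python
from collections import defaultdict
from itertools import combinations

def average_clustering(links):
    """Average CC of a directed network, ignoring direction and weights.

    Vertices with fewer than two neighbours contribute 0 (our convention).
    """
    nbrs = defaultdict(set)
    for a, b in links:
        if a != b:                      # self-loops do not enter the CC
            nbrs[a].add(b)
            nbrs[b].add(a)
    if not nbrs:
        return 0.0
    total = 0.0
    for v, ns in nbrs.items():
        k = len(ns)
        if k < 2:
            continue
        # Fraction of neighbour pairs that are themselves linked.
        closed = sum(1 for x, y in combinations(ns, 2) if y in nbrs[x])
        total += closed / (k * (k - 1) / 2)
    return total / len(nbrs)

# Triangle A-B-C plus a pendant vertex D attached to A.
links = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "D")]
print(round(average_clustering(links), 4))  # 0.5833
```

Here A has CC = 1/3, B and C have CC = 1, and D has CC = 0, giving an average of 7/12.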
Fig. 5: (Color online) Ranking vectors for matrices G with $\alpha = 1$. Same color code as in Fig. 1; d = 4. From top to bottom, top bundle: PageRank. Second bundle: CheiRank. Third bundle: Hubs. Fourth (bottom) bundle: Authorities. Straight dashed line is y = -x. Inset: clustering coefficient (CC) as a function of the number of games $n_g$ included to construct the network; blue squares: professional tournaments, all games included; circles: amateur games.

Ranking vectors for the go network. In order to get an insight into how our network captures aspects of the strategy of the game, we now use the weighted adjacency matrix (links are weighted according to their frequency in the database) and apply tools developed to rank vertices in order of importance, used e.g. by search engines to determine the order in which answers to queries are displayed. These methods generally build a ranking vector whose value on each vertex determines its importance. The most famous such vector is the PageRank vector [IHIIH], which was the basis of the Google search engine. It is built from the Google matrix G, defined as $G = \alpha S + (1-\alpha)\, e^{T} e / N$, where $e = (1,\dots,1)$ (so that $e^{T}e/N$ is the matrix with all entries equal to $1/N$), $N = 1107$ and $0 < \alpha \le 1$; S is the weighted adjacency matrix with any column of zeros replaced by a column of 1's, and the sum of each column normalized to 1. The PageRank vector is the right eigenvector of G associated with the largest eigenvalue $\lambda = 1$, and singles out vertices with many incoming links from important nodes. By construction (Perron-Frobenius theorem), its components are real and nonnegative, and can therefore be used to rank nodes according to their values. Other ranking vectors built from G include the CheiRank vector (the PageRank of the network with all links inverted), and the Hubs and Authorities of the HITS algorithm. They all share the property of being real nonnegative vectors, and thus can be used to rank the nodes of the network. While PageRanks and Hubs reflect properties of vertices depending on their incoming links, CheiRanks and Authorities are based on outgoing links. In Fig. 5 we show these ranking vectors for our networks. They follow an algebraic law with slope $\approx -1$ (PageRank and CheiRank) and $\approx -1.5$ (Hubs and Authorities). A similar distribution of the PageRank was observed e.g. in the WWW [T71I22], but in contrast with the WWW and other systems there is a marked symmetry between the distributions of ranking vectors based on ingoing links and those of vectors based on outgoing links.
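A minimal sparse power-iteration sketch of the PageRank and CheiRank vectors defined above. Vertices are assumed to be the integers 0..N-1, and `weights[(j, i)]` the number of times move i followed move j, so that column j of S is the normalized out-weight of j (or the uniform column if j has no outgoing link). The default $\alpha = 0.85$ is the customary value that guarantees convergence; with $\alpha = 1$, as in the paper's figures, plain power iteration is not guaranteed to converge in general, since G need not be primitive. Function names are hypothetical.

```python
from collections import defaultdict

def pagerank(weights, n, alpha=0.85, tol=1e-12, max_iter=10_000):
    """Right eigenvector of G = alpha*S + (1 - alpha) e^T e / N for eigenvalue 1.

    weights[(j, i)] is the weight of the link j -> i; vertices are 0..n-1.
    """
    out_weight = defaultdict(float)
    for (j, _), w in weights.items():
        out_weight[j] += w
    p = [1.0 / n] * n
    for _ in range(max_iter):
        # Mass on vertices without outgoing links is spread uniformly.
        dangling = sum(p[j] for j in range(n) if out_weight[j] == 0)
        new = [alpha * dangling / n + (1 - alpha) / n] * n
        for (j, i), w in weights.items():
            new[i] += alpha * w / out_weight[j] * p[j]
        if sum(abs(a - b) for a, b in zip(new, p)) < tol:
            return new
        p = new
    return p

def cheirank(weights, n, alpha=0.85, **kwargs):
    """PageRank of the same network with every link inverted."""
    return pagerank({(i, j): w for (j, i), w in weights.items()}, n, alpha, **kwargs)

# Vertices 0 and 2 both point to vertex 1, which has no outgoing link.
star = {(0, 1): 1.0, (2, 1): 1.0}
print([round(x, 4) for x in pagerank(star, 3)])  # [0.2128, 0.5745, 0.2128]
```

Each iteration costs one pass over the nonzero weights, so the full 1107-vertex network is cheap to handle without forming G explicitly.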







Fig. 6: (Color online) $K^*$ vs K, where K (resp. $K^*$) is the rank of a vertex when ordered according to the PageRank vector (resp. CheiRank vector), for Honinbo (black squares), Meijin (red circles), Judan (green diamonds), Kisei (blue crosses) and amateur (violet stars) databases.

In order to shed further light on this symmetry, we plot in Fig. 6 the correlation between the PageRank and the CheiRank for the five databases considered. In all these cases, there is a remarkably strong correlation between these rankings, based respectively upon ingoing and outgoing links. In the WWW, ingoing and outgoing links differ in nature: webmasters are free to create as many outgoing links as they wish from their webpage, whereas the ingoing links depend on the cumulative effect of the behaviors of all other webmasters. In contrast, for the go network, whether a link is ingoing or outgoing depends only on the chronological order in which the moves are played. The results displayed in Fig. 6 thus seem to indicate that there is a strong correlation between moves which open many possibilities of new moves and moves that can follow many other moves. However, the symmetry is far from exact, as the scatter of points in that figure shows.
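The rank pairs $(K, K^*)$ plotted in Fig. 6 follow directly from the two ranking vectors. A short sketch, assuming the PageRank and CheiRank vectors are given as lists indexed by vertex; ties are broken by vertex index. The Spearman coefficient is not used in the text: we include it only as one possible scalar summary of the correlation visible in Fig. 6, and both function names are hypothetical.

```python
def ranks(vector):
    """K[i]: rank of vertex i when sorted by decreasing value (1 = top)."""
    order = sorted(range(len(vector)), key=lambda i: -vector[i])
    K = [0] * len(vector)
    for position, i in enumerate(order):
        K[i] = position + 1
    return K

def spearman(K, K_star):
    """Spearman rank correlation of two tie-free rankings of the same vertices."""
    n = len(K)
    d2 = sum((a - b) ** 2 for a, b in zip(K, K_star))
    return 1 - 6 * d2 / (n * (n * n - 1))

page_rank = [0.5, 0.2, 0.3]
chei_rank = [0.4, 0.1, 0.5]
K, K_star = ranks(page_rank), ranks(chei_rank)
print(list(zip(K, K_star)))  # [(1, 2), (3, 3), (2, 1)]
```

Points near the diagonal $K^* = K$ correspond to moves that are ranked similarly whether one looks at ingoing or outgoing links.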

Although there is always some correlation between the different ranking vectors, they can usually be quite different, as for example for Wikipedia articles [5D]. A recent analysis of the world trade network [23] showed such a symmetry when all commodities were aggregated, but the symmetry was much less visible when each commodity was treated separately. It is possible that in our case the symmetry is made more prominent by our definition of moves through $3\times 3$ plaquettes. A more refined approach with larger plaquettes may thus disambiguate the moves and give different results. We nevertheless think that our results reveal a specific feature of the game, such as the existence of many short sequences of tactical moves which can be played at different moments of the game.
Fig. 7: (Color online) Top left: eigenvalues in the complex plane for matrices G, d = 4, $\alpha = 1$; black circles: Honinbo; red crosses: amateur. Bottom: $\lambda_c$ such that, from top to bottom, 99%, 95%, 90%, 80% of eigenvalues $\lambda$ verify $|\lambda| < \lambda_c$, for amateur games. Top right: $\lambda_c$ for 80% of eigenvalues for our 5 databases, same color code as in Fig. 1.

Eigenvectors of the Google matrix. As can be seen in Fig. 5, the ranking vectors are distributed according to power laws and are thus mainly localized on a few vertices, mostly the most frequent ones according to Zipf's law (see e.g. Fig. 9, top line). However, these ranking vectors correspond to the eigenvector associated with the largest eigenvalue of different matrices built from the network. We now consider the other eigenvectors of G. In particular, the eigenvectors associated with next-to-leading eigenvalues can describe specific communities inside the network. The distribution of eigenvalues is also important, as it reflects the structure of the network [52]. Fig. 7 shows the complex eigenvalues of the matrix G. For the WWW a sizable fraction of eigenvalues are close to zero, while the remaining ones are spread inside the unit circle, with no gap between the first eigenvalue and the bulk [35]. By contrast, in the case of the go network, there is a huge gap between the first eigenvalue $\lambda = 1$ and the next ones. This is reminiscent of what can be seen in some lexical networks [21]. Such features indicate that the network is well connected, with few isolated communities, and are consistent with the high clustering coefficient found above (see inset of Fig. 5). Whereas the WWW contains many communities of webpages which are almost cut off from the rest of the network, this is not the case for the go network, where communities (i.e. sequences of tactical moves preferentially played together) have more connections to the rest of the network, indicating that tactical moves can belong to different strategic groupings. To put these data in perspective, we have constructed a randomized version of the network by randomly shuffling the moves inside each game of the databases. This process conserves Zipf's law and the global characteristics, but the eigenvalues of G are then all concentrated in the bulk (data not shown), in contrast with the real go network, indicating that communities are destroyed by the randomization process.
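Two ingredients of this spectral analysis are easy to sketch: the randomized null model and the bulk radius $\lambda_c$ of Fig. 7. We assume that each move keeps its `(h, v, label)` tuple when the moves of a game are shuffled, so that move frequencies are unchanged and the links must then be rebuilt with the same distance rule. We take $\lambda_c$ as the smallest radius enclosing the requested fraction of eigenvalues, using $\le$ instead of the strict inequality of the caption. The eigenvalues themselves would come from a dense solver such as `numpy.linalg.eigvals` applied to G; the function names here are hypothetical.

```python
import math
import random

def shuffled_games(games, seed=0):
    """Null model: randomly permute the moves inside each game.

    Each move keeps its (h, v, label) tuple, so move frequencies
    (and hence Zipf's law) are preserved.
    """
    rng = random.Random(seed)
    result = []
    for game in games:
        moves = list(game)
        rng.shuffle(moves)
        result.append(moves)
    return result

def bulk_radius(eigenvalues, fraction=0.8):
    """Smallest lambda_c with |lambda| <= lambda_c for `fraction` of the eigenvalues."""
    moduli = sorted(abs(z) for z in eigenvalues)
    k = math.ceil(fraction * len(moduli) - 1e-9)   # eigenvalues to enclose
    return moduli[max(k, 1) - 1]

print(bulk_radius([1, 0.5, 0.5j, -0.2, 0.1], 0.8))  # 0.5
```

Tracking `bulk_radius` as games are added one by one would reproduce the kind of dependence on $n_g$ shown in the bottom panel of Fig. 7.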
Fig. 8: (Color online) Moduli squared of the right eigenvectors associated with the 7 largest eigenvalues $|\lambda_1| = 1 > |\lambda_2| \ge \dots \ge |\lambda_7|$ of G (Honinbo database) for the first 100 moves in decreasing frequency ($|\lambda_1|$ (PageRank): thick black line; $|\lambda_2|$: orange pluses; $|\lambda_3| = |\lambda_4|$: red circles; $|\lambda_5|$: green squares; $|\lambda_6|$: blue triangles; $|\lambda_7|$: violet stars). Inset: same for the amateur database (full black line) and the random network (dashed red line, see text).



Fig. 7 (bottom) shows that the radius of the bulk of eigenvalues changes with the number of games $n_g$ entered in the network. This indicates that as more games are taken into account, rare links appear which increasingly break up the weakly coupled communities.

Fig. 9: Moves corresponding to the 10 largest entries of the right eigenvectors of G for eigenvalues $\lambda_1$ (PageRank) (top), $\lambda_3$ (middle) and $\lambda_7$ (bottom), Honinbo database. Black is playing at the cross. The top line coincides with the 10 most frequent moves.

The next-to-leading eigenvalues are important, as they indicate the presence of communities of moves which have common features. The distribution of the first 7 eigenvectors (Fig. 8) shows that they are concentrated on particular sets of moves, different for each vector. The corresponding moves are displayed in Fig. 9 for the Honinbo database. The first eigenvector is mainly localized on the most frequent moves. By contrast, the third one is localized on moves describing captures of the opponent's stones, and some of them single out the well-known situation of ko ("eternity"), where players alternately repeat captures. The 7th eigenvector singles out moves which appear to protect an isolated stone by connecting it to a chain. These eigenvectors differ between tournaments and between professional and amateur games. Indeed, the inset of Fig. 8 shows the distribution of the first seven eigenvectors for the amateur database, which is very different from the one for Honinbo. It also shows, for comparison, the distribution for the randomized network (see above), which is much less peaked. Systematic studies of these eigenvectors, as well as of the frequency of sequences of moves, should make it possible to group together certain moves, and should help to develop efficient go simulators.
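Figures 8 and 9 rest on the squared moduli of eigenvector entries. A short sketch, assuming an eigenvector of G has already been computed elsewhere (for instance with `numpy.linalg.eig`) and that `labels[i]` names the plaquette of vertex i. The normalization $\sum_i |\psi_i|^2 = 1$ is our choice, since the text does not specify one, and `top_moves` is a hypothetical name.

```python
def top_moves(eigenvector, labels, k=10):
    """The k vertices with the largest |psi_i|^2, normalized so that sum |psi_i|^2 = 1.

    Entries may be complex, as eigenvectors of the non-symmetric matrix G
    generally are.
    """
    weights = [abs(c) ** 2 for c in eigenvector]
    total = sum(weights)
    order = sorted(range(len(weights)), key=lambda i: -weights[i])[:k]
    return [(labels[i], weights[i] / total) for i in order]

psi = [0.1, -0.9, 0.3j, 0.0]
print([label for label, _ in top_moves(psi, ["a", "b", "c", "d"], k=2)])  # ['b', 'c']
```

Comparing the labels returned for different eigenvectors is what singles out groups such as the capture and ko moves of the third eigenvector.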

Conclusion. In this paper, we have studied the game of go, one of the most ancient and complex board games, from a complex network perspective. We have defined a proper categorization of moves taking into account the local environment, and shown that in this case Zipf's law emerges from data taken from different tournaments. The network of go moves has some peculiarities, such as a statistical symmetry between the distributions of ingoing and outgoing links, which is reflected in a symmetry between rankings based on ingoing and on outgoing links, a feature not seen in many other complex directed networks such as the WWW. Differences between professional tournaments and amateur games can be seen. Properties of eigenvalues and eigenvectors of the matrices producing ranking vectors vary between amateur and different professional tournaments. Certain eigenvectors are localized on specific groups of moves which correspond to different strategies. We think that the point of view developed in this paper should allow better modeling of such games and could also help to design simulators which could in the future beat good human players. Our approach could be used for other types of games, and in parallel shed light on the human decision-making process.